TR 09-002 Mixed-Membership Naive Bayes Models

Authors

  • Arindam Banerjee
  • Hanhuai Shan
Abstract

In recent years, mixture models have found widespread use in discovering latent cluster structure in data. A popular special case of finite mixture models is the naive Bayes model, in which the probability of a feature vector factorizes over the features for any given mixture component. Despite their popularity, naive Bayes models suffer from two important restrictions: first, they have no natural mechanism for handling sparsity, where each data point may have only a few observed features; and second, they do not allow an object to be generated from several latent clusters with varying degrees of membership (i.e., mixed memberships) in the generative process. In this paper, we first introduce marginal naive Bayes (MNB) models, which generalize naive Bayes models to handle sparsity by marginalizing over all missing features. More importantly, we propose mixed-membership naive Bayes (MMNB) models, which generalize (marginal) naive Bayes models to allow for mixed memberships in the generative process. MMNB models can be viewed as a natural generalization of latent Dirichlet allocation (LDA) with the ability to handle heterogeneous and possibly sparse feature vectors. We propose two variational inference algorithms to learn MMNB models from data. While the first exactly follows the corresponding ideas for LDA, the second uses far fewer variational parameters, leading to a much faster algorithm with smaller time and space requirements. Applying the same idea to topic modeling leads to a new Fast LDA algorithm. The efficacy of the proposed mixed-membership models and the fast variational inference algorithms is demonstrated by extensive experiments on a wide variety of datasets.
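The two modeling ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes purely discrete features, marks missing features with None, and assumes an LDA-style generative process in which each observed feature draws its own component from a per-object Dirichlet-distributed mixing vector. The function names, the `probs` layout, and the toy numbers are all hypothetical.

```python
import math
import random


def mnb_log_likelihood(x, weights, feature_probs):
    """Log-likelihood of one data point under a marginal naive Bayes mixture.

    x: list of category indices, with None marking a missing feature.
    weights: mixture weights, one per component (assumed to sum to 1).
    feature_probs[c][j][v]: probability that feature j takes value v
    in component c (hypothetical layout, discrete features only).
    """
    total = 0.0
    for c, w in enumerate(weights):
        p = w
        for j, v in enumerate(x):
            if v is None:
                # Marginalizing a missing feature sums its distribution
                # to 1, so it simply drops out of the product.
                continue
            p *= feature_probs[c][j][v]
        total += p
    return math.log(total)


def sample_mmnb(alpha, feature_probs, observed, rng):
    """Draw one object from an assumed mixed-membership generative process.

    alpha: Dirichlet parameters, one per component.
    observed: list of booleans, True where feature j is present.
    Returns the mixing vector pi and the sampled feature list.
    """
    # pi ~ Dirichlet(alpha), sampled via normalized gamma draws.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    norm = sum(gammas)
    pi = [g / norm for g in gammas]
    x = []
    for j, is_observed in enumerate(observed):
        if not is_observed:
            x.append(None)
            continue
        # Each observed feature picks its own component z_j ~ Discrete(pi),
        # then draws its value from that component's distribution.
        z = rng.choices(range(len(pi)), weights=pi)[0]
        dist = feature_probs[z][j]
        x.append(rng.choices(range(len(dist)), weights=dist)[0])
    return pi, x


# Toy model: 2 components, 2 binary features (made-up numbers).
weights = [0.5, 0.5]
probs = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.3, 0.7], [0.6, 0.4]],
]

if __name__ == "__main__":
    print(mnb_log_likelihood([0, None], weights, probs))
    print(sample_mmnb([1.0, 1.0], probs, [True, True], random.Random(0)))
```

In this sketch, the likelihood of a point with a missing feature equals the sum of the likelihoods over every value that feature could take, which is what marginalization means here. The sampler differs from a plain mixture in that two features of the same object may come from different components.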


Similar resources

A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

With the rapid growth in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and the Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large size of the feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...


Diagnosis of Pulmonary Tuberculosis Using Artificial Intelligence (Naive Bayes Algorithm)

Background and Aim: Despite the implementation of effective preventive and therapeutic programs, no significant success has been achieved in reducing tuberculosis. One of the reasons is the delay in diagnosis. Therefore, a diagnostic aid system can help diagnose tuberculosis early. The purpose of this research was to evaluate the role of the Naive Bayes algorithm as a...


Visual Saliency Detection for RGB-D Images with Generative Model

In this paper, we propose a saliency detection model for RGB-D images based on the contrasting features of colour and depth with a generative mixture model. The depth feature map is extracted based on superpixel contrast computation with spatial priors. We model the depth saliency map by approximating the density of depth-based contrast features using a Gaussian distribution. Similar to the dep...


Semi-supervised Self-training for Sentence Subjectivity Classification

Recent natural language processing (NLP) research shows that identifying and extracting subjective information from texts can benefit many NLP applications. In this paper, we address a semi-supervised learning approach, self-training, for sentence subjectivity classification. In self-training, the confidence degree that depends on the ranking of class membership probabilities is commonly used a...


Reducing multiclass to binary by coupling probability estimates

This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension for arbitrary code matrices of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method p...



Publication date: 2009